
add mcp example to llama stack docs #168

Merged
davidwtf merged 4 commits into master from fix/AI-23784
Mar 26, 2026

Conversation

davidwtf (Contributor) commented Mar 25, 2026

Summary by CodeRabbit

  • Documentation
    • Install now targets vLLM OpenAI-compatible endpoints via VLLM_URL; VLLM_API_TOKEN is optional (example commented); manifest example uses in-cluster vLLM base URL.
    • Default storage reduced from 20Gi to 1Gi.
    • Quickstart condensed: adds two tool-integration paths (client-side and MCP/FastMCP), a shared agent flow, dual tool-calling selector, and improved init/cleanup and error handling.
    • Added vLLM tool-calling guidance for KServe (container args and parser selection).
    • Removed separate MCP-only notebook; integrated MCP content into main quickstart.


coderabbitai bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 79eec8d1-c4ee-44b0-92dd-d600d76490e4

📥 Commits

Reviewing files that changed from the base of the PR and between d26eef2 and e807a06.

📒 Files selected for processing (1)
  • docs/public/llama-stack/llama-stack_quickstart.ipynb
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/public/llama-stack/llama-stack_quickstart.ipynb

Walkthrough

This PR updates Llama Stack docs and notebooks to replace third‑party LLM API secret guidance with vLLM OpenAI‑compatible endpoints (VLLM_URL), add vLLM/KServe tool‑calling guidance and predictor args, consolidate MCP examples into the main quickstart, and refactor the quickstart notebook to support dual tool integration paths (client-side and MCP).

Changes

Cohort / File(s) Summary
Install & vLLM config
docs/en/llama_stack/install.mdx
Replace provider API-secret guidance with VLLM_URL (example in-cluster http://.../v1), make VLLM_API_TOKEN optional (commented secret), reduce PVC from 20Gi to 1Gi, and add KServe/vLLM predictor container args and parser guidance for tool-calling (--enable-auto-tool-choice, --tool-call-parser).
Quickstart docs
docs/en/llama_stack/quickstart.mdx
Update prerequisites to require VLLM_URL; condense steps and emphasize two tool integration paths (local @client_tool and MCP via FastMCP + toolgroups.register) with a shared agent flow and optional FastAPI deployment.
Main quickstart notebook (refactor)
docs/public/llama-stack/llama-stack_quickstart.ipynb
Large refactor: add TOOL_OPTION selector, implement dual tool paths (client vs FastMCP), optional FastMCP bootstrap with readiness probing, conditional MCP toolgroup registration (provider_id="model-context-protocol"), shared AGENT_TOOLS, improved error handling and cleanup (terminate MCP process, clear MCP_SERVER_URL), add fastmcp dependency and troubleshooting cells.
Removed MCP‑only notebook
docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb
Deleted standalone MCP quickstart notebook; its examples consolidated into the main quickstart notebook under conditional cells.
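The dual tool-path selection described in the walkthrough can be sketched roughly as follows. This is a hypothetical illustration of the pattern, not the notebook's actual code; the names TOOL_OPTION, AGENT_TOOLS, get_weather, and the "mcp::weather" toolgroup id are taken from the walkthrough's terminology and may differ in the real notebook.

```python
# Illustrative sketch of the TOOL_OPTION selector; names are assumptions
# based on the walkthrough above, not the notebook's exact code.
TOOL_OPTION = "client"  # "client" for the local @client_tool path, "mcp" for FastMCP


def get_weather(city: str) -> str:
    """Stand-in local client tool used on the client-side path."""
    return f"Sunny in {city}"


if TOOL_OPTION == "client":
    # Client-side path: pass the callable directly to the agent.
    AGENT_TOOLS = [get_weather]
elif TOOL_OPTION == "mcp":
    # MCP path: reference the registered toolgroup by its identifier.
    AGENT_TOOLS = ["mcp::weather"]
else:
    raise ValueError(f"Unknown TOOL_OPTION: {TOOL_OPTION}")
```

Resetting AGENT_TOOLS in the same cell that assigns TOOL_OPTION (as one of the review comments later suggests) keeps reruns from silently reusing the other path's tools.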

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Notebook
  participant FastMCP
  participant Client as LlamaStackClient
  participant vLLM as vLLM/KServe
  rect rgba(200,230,255,0.5)
    Note over Notebook,FastMCP: TOOL_OPTION selects tool integration path
  end
  User->>Notebook: run notebook (choose TOOL_OPTION)
  alt Client-side tool
    Notebook->>Client: register local client tool (`@client_tool`) -> AGENT_TOOLS
    Client->>vLLM: send agent/model request (may include tool call)
    vLLM-->>Client: model response / tool invocation result
    Client-->>Notebook: stream results
  else MCP tool (FastMCP)
    Notebook->>FastMCP: spawn MCP server (background process)
    Notebook->>Client: register MCP toolgroup (provider_id=model-context-protocol) -> AGENT_TOOLS
    Client->>FastMCP: invoke MCP tool (tool metadata)
    FastMCP->>vLLM: forward/coordinate model calls
    vLLM-->>FastMCP: model response
    FastMCP-->>Client: tool result
    Client-->>Notebook: stream results
  end
  Notebook->>FastMCP: cleanup (terminate server)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • zhaomingkun1030
  • typhoonzero

Poem

🐰 I hopped through docs and notebooks bright,
Two tool paths tucked in one cozy light.
FastMCP hums, client tools sing,
Cleanup hops in at the end of the ring.
Cheers — a rabbit's joyful byte!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Title check (⚠️ Warning): The PR title 'add mcp example to llama stack docs' partially describes the changeset. While MCP is mentioned, the title does not capture the broader scope: the PR consolidates two separate MCP notebooks into one, restructures the quickstart guide, and updates installation documentation with vLLM configuration changes. Resolution: consider a more descriptive title such as 'Consolidate MCP examples and update LlamaStack docs for vLLM integration' to better reflect the full scope of documentation changes.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.15.6)
docs/public/llama-stack/llama-stack_quickstart.ipynb

Unexpected end of JSON input


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai bot left a comment

🧹 Nitpick comments (3)
docs/public/llama-stack/llama-stack_quickstart.ipynb (3)

197-214: Consider adding a readiness check instead of a fixed sleep.

The time.sleep(1.5) at line 203 is a race condition—the MCP server may not be ready to accept connections yet, especially on slower systems. Consider polling the endpoint for readiness.

💡 Suggested improvement for server readiness
 mcp_process = Process(target=_run_mcp_weather_server, daemon=True)
 mcp_process.start()

 import socket
 import time

-time.sleep(1.5)
+# Wait for server to be ready (up to 10 seconds)
+for _ in range(20):
+    time.sleep(0.5)
+    try:
+        with socket.create_connection(("127.0.0.1", 8002), timeout=1):
+            break
+    except OSError:
+        pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/public/llama-stack/llama-stack_quickstart.ipynb` around lines 197 - 214,
Replace the fixed sleep that waits for the MCP server with an active readiness
poll: after starting mcp_process (the Process created with target
_run_mcp_weather_server), repeatedly attempt an HTTP GET to the MCP endpoint
(use MCP_SERVER_URL if set or construct http://{host}:8002/mcp like the current
logic) with a short timeout and small backoff, stop polling when you receive a
successful response (2xx) or after a reasonable overall timeout, and only then
set os.environ["MCP_SERVER_URL"] and print the success message; keep the same
host/port (/mcp) and error/timeout handling so the notebook doesn’t hang
indefinitely.
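The readiness poll suggested above can be factored into a small reusable helper. This is a generic sketch, not code from the notebook: it probes at the TCP level rather than issuing the HTTP GET the prompt describes, and the host, port, and timeout values are illustrative.

```python
import socket
import time


def wait_for_server(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll a TCP endpoint until it accepts connections or the deadline passes.

    Returns True once a connection succeeds, False if the overall timeout
    elapses first, so the caller can fail fast instead of hanging.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(0.5)  # brief backoff before the next attempt
    return False
```

In the notebook's flow this would be called right after mcp_process.start(), raising a clear error (and terminating the child process) when it returns False, and only setting MCP_SERVER_URL on success.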

289-298: Redundant import and client initialization.

LlamaStackClient is imported and client is initialized earlier at lines 91-96. This cell re-imports LlamaStackClient and Agent, but then uses the existing client variable from the earlier cell. Consider removing the redundant import or clarifying that cells are designed to be run independently.

Note: If the intent is cell independence for users who skip Option A/B setup cells, this is acceptable but could be documented.

💡 Suggested simplification
 import os
-from llama_stack_client import LlamaStackClient, Agent
+from llama_stack_client import Agent


 if "AGENT_TOOLS" not in globals():
     raise RuntimeError("AGENT_TOOLS is missing. Run the matching setup cell(s) in section 2 for the selected TOOL_OPTION.")

 base_url = os.getenv("LLAMA_STACK_URL", "http://localhost:8321")
 print(f"Connecting to Server: {base_url}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/public/llama-stack/llama-stack_quickstart.ipynb` around lines 289 - 298,
The cell re-imports LlamaStackClient and Agent and references the
previously-created client, causing redundancy or confusion; either remove the
duplicate imports and rely on the earlier client variable (keep AGENT_TOOLS and
base_url handling) or make the cell self-contained by checking/creating a client
when missing (e.g., test for existing client variable and call
LlamaStackClient(...) to initialize it) so references to LlamaStackClient,
Agent, client, AGENT_TOOLS and base_url are consistent; update cell text to
document the chosen approach (dependent vs standalone cells).

205-210: IP detection may be unreliable in containerized environments.

The socket.gethostbyname(socket.gethostname()) approach may return unexpected results in containers or when multiple network interfaces exist. The fallback to MCP_SERVER_HOST env var is good, but consider documenting this limitation more prominently or defaulting to 127.0.0.1 with a clearer note.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/public/llama-stack/llama-stack_quickstart.ipynb` around lines 205 - 210,
The IP autodetection using socket.gethostbyname(socket.gethostname()) to set
MCP_SERVER_URL is unreliable in containers/multi-interface hosts; update the
logic around MCP_SERVER_URL so it prefers an explicit MCP_SERVER_HOST env var
(MCP_SERVER_HOST) or defaults to 127.0.0.1 when MCP_SERVER_HOST is missing, and
add a brief inline comment or documentation note next to the MCP_SERVER_URL
assignment explaining the container/network limitation and instructing users to
set MCP_SERVER_HOST when running inside containers; reference the MCP_SERVER_URL
assignment and the socket.gethostbyname/socket.gethostname calls when making the
change.
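The fallback order described in this prompt can be sketched as a small helper. This is an assumed shape, not the notebook's code; the function name resolve_mcp_host is hypothetical, while MCP_SERVER_HOST is the env var named in the review comment.

```python
import os
import socket


def resolve_mcp_host() -> str:
    """Pick a host for building MCP_SERVER_URL.

    Fallback order per the suggestion above: an explicit MCP_SERVER_HOST
    wins, then hostname-based autodetection, then 127.0.0.1. Inside
    containers, gethostbyname(gethostname()) may return an address that
    other workloads cannot reach, so set MCP_SERVER_HOST explicitly there.
    """
    explicit = os.getenv("MCP_SERVER_HOST")
    if explicit:
        return explicit
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:  # covers socket.gaierror and socket.herror
        return "127.0.0.1"
```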
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/public/llama-stack/llama-stack_quickstart.ipynb`:
- Around line 197-214: Replace the fixed sleep that waits for the MCP server
with an active readiness poll: after starting mcp_process (the Process created
with target _run_mcp_weather_server), repeatedly attempt an HTTP GET to the MCP
endpoint (use MCP_SERVER_URL if set or construct http://{host}:8002/mcp like the
current logic) with a short timeout and small backoff, stop polling when you
receive a successful response (2xx) or after a reasonable overall timeout, and
only then set os.environ["MCP_SERVER_URL"] and print the success message; keep
the same host/port (/mcp) and error/timeout handling so the notebook doesn’t
hang indefinitely.
- Around line 289-298: The cell re-imports LlamaStackClient and Agent and
references the previously-created client, causing redundancy or confusion;
either remove the duplicate imports and rely on the earlier client variable
(keep AGENT_TOOLS and base_url handling) or make the cell self-contained by
checking/creating a client when missing (e.g., test for existing client variable
and call LlamaStackClient(...) to initialize it) so references to
LlamaStackClient, Agent, client, AGENT_TOOLS and base_url are consistent; update
cell text to document the chosen approach (dependent vs standalone cells).
- Around line 205-210: The IP autodetection using
socket.gethostbyname(socket.gethostname()) to set MCP_SERVER_URL is unreliable
in containers/multi-interface hosts; update the logic around MCP_SERVER_URL so
it prefers an explicit MCP_SERVER_HOST env var (MCP_SERVER_HOST) or defaults to
127.0.0.1 when MCP_SERVER_HOST is missing, and add a brief inline comment or
documentation note next to the MCP_SERVER_URL assignment explaining the
container/network limitation and instructing users to set MCP_SERVER_HOST when
running inside containers; reference the MCP_SERVER_URL assignment and the
socket.gethostbyname/socket.gethostname calls when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a79c809c-a938-492e-9d3a-875a7b08b56f

📥 Commits

Reviewing files that changed from the base of the PR and between 8e7f560 and 9d54906.

📒 Files selected for processing (4)
  • docs/en/llama_stack/install.mdx
  • docs/en/llama_stack/quickstart.mdx
  • docs/public/llama-stack/llama-stack_quickstart.ipynb
  • docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb
💤 Files with no reviewable changes (1)
  • docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb


cloudflare-workers-and-pages bot commented Mar 25, 2026

Deploying alauda-ai with Cloudflare Pages

Latest commit: e807a06
Status: ✅  Deploy successful!
Preview URL: https://2e569e01.alauda-ai.pages.dev
Branch Preview URL: https://fix-ai-23784.alauda-ai.pages.dev

View logs


coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/public/llama-stack/llama-stack_quickstart.ipynb`:
- Around line 233-240: The code currently overwrites
os.environ["MCP_SERVER_URL"] without preserving the caller's original value;
capture the original value (e.g., orig = os.environ.get("MCP_SERVER_URL"))
before setting MCP_SERVER_URL and set os.environ["MCP_SERVER_URL"] to the
computed value, then in the corresponding cleanup/teardown section restore the
environment to its prior state (if orig is None delete the key, otherwise set it
back to orig); apply the same capture-and-restore pattern for the other
occurrence referenced (the block around the later cleanup).
- Around line 104-106: The notebook can reuse a stale AGENT_TOOLS when
TOOL_OPTION is switched; update the TOOL_OPTION selector cell to reset/clear
AGENT_TOOLS (e.g., set AGENT_TOOLS = None or {}) whenever TOOL_OPTION is
assigned, and add a validation immediately before the agent construction step to
assert AGENT_TOOLS matches the chosen TOOL_OPTION (or raise an explicit error)
so reruns fail fast and the wrong backend is never used.
- Around line 114-116: The shared FastAPI test flow calls requests.post(...) but
the requests import is currently inside the Option A-only cell (which may be
skipped); move the requests import out of the Option A-only cell into a
common/non-optional setup cell (or add an explicit requests import in the API
test cell) so requests is always defined for the shared flow; update references
around get_weather and AGENT_TOOLS cells accordingly to ensure requests is
available regardless of TOOL_OPTION.
- Around line 183-241: The notebook spawns multiprocessing.Process with a nested
target (_run_mcp_weather_server) which is not picklable on spawn-based kernels
and uses a brittle time.sleep(5) readiness wait; move the server entrypoint out
of the cell into a top-level, importable function (e.g., module-level
run_mcp_weather_server) or launch the FastMCP server via subprocess instead of
Process, then replace the fixed sleep with an explicit readiness poll against
the MCP endpoint (MCP_SERVER_URL) with retry/backoff and timeout before setting
os.environ and printing; update usages of mcp_process, get_weather_mcp, and the
Process creation to reference the new top-level entrypoint.
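The capture-and-restore pattern requested in the first inline comment is a natural fit for a context manager. A minimal sketch, assuming only the standard library; temp_env is a hypothetical helper name, not something from the notebook.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env(key: str, value: str):
    """Set an environment variable, restoring its prior state on exit.

    If the variable was unset before, it is deleted again afterwards;
    otherwise the original value is put back.
    """
    original = os.environ.get(key)
    os.environ[key] = value
    try:
        yield
    finally:
        if original is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = original
```

Usage in the notebook's cleanup flow would look like `with temp_env("MCP_SERVER_URL", url): ...`, guaranteeing the caller's original MCP_SERVER_URL survives even if a cell raises.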

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d37c39e0-1029-46c4-8892-583e0d623e9e

📥 Commits

Reviewing files that changed from the base of the PR and between 9d54906 and 6e2721f.

📒 Files selected for processing (2)
  • docs/en/llama_stack/install.mdx
  • docs/public/llama-stack/llama-stack_quickstart.ipynb
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/en/llama_stack/install.mdx


coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/public/llama-stack/llama-stack_quickstart.ipynb`:
- Around line 255-261: Wrap the call to
socket.gethostbyname(socket.gethostname()) in a try/except that catches
socket.gaierror and socket.herror (and optionally a broad Exception) and on
exception fall back to using os.getenv("MCP_SERVER_HOST", "127.0.0.1") for
_host; then proceed to build MCP_SERVER_URL as before (look for the
MCP_SERVER_URL assignment and the _host logic around socket.gethostbyname to
apply this change).
- Around line 296-309: AGENT_TOOLS is using an invalid dict format for MCP;
after calling client.toolgroups.register(...) you must fetch the actual tool
objects or identifiers and pass those to Agent(tools=...), e.g. call
client.tools.list(toolgroup_id=toolgroup_id) and assign AGENT_TOOLS to that
returned list (or build a list of explicit MCP identifiers like
["mcp::filesystem::read_file"]) so Agent(tools=AGENT_TOOLS) receives valid tool
entries instead of the current {"type":..., "server_label":...,
"server_url":...} structure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3519d066-0229-46b8-8681-c7fd9604a6a6

📥 Commits

Reviewing files that changed from the base of the PR and between 6e2721f and d26eef2.

📒 Files selected for processing (1)
  • docs/public/llama-stack/llama-stack_quickstart.ipynb

@xZhizhizhizhizhi

/test-pass

davidwtf merged commit e1f395f into master Mar 26, 2026
3 checks passed